Sentiment Analysis of Tunisian Dialects: Linguistic Ressources and Experiments
Abstract
Dialectal Arabic (DA) differs significantly from the Arabic taught in schools and used in written communication and formal speech (broadcast news, religion, politics, etc.). Much existing research addresses Arabic Sentiment Analysis (SA); however, it is generally restricted to Modern Standard Arabic (MSA) or to dialects of economic or political interest. In this paper we focus on SA of the Tunisian dialect. We use Machine Learning techniques to determine the polarity of comments written in the Tunisian dialect. First, we evaluate the performance of SA systems with models trained on freely available MSA and multi-dialectal data sets. We then collect and annotate a Tunisian dialect corpus of 17,000 comments from Facebook. Models trained on this corpus show a significant improvement over the best model trained on other Arabic dialects or MSA data. We believe that this first freely available corpus will be valuable to researchers working on Tunisian Sentiment Analysis and similar areas.
Similar resources
Language a Phenomenon of Culture and Communication, Preserving the Language through a phonetic Map
Language is the key instrument by which we assimilate the culture of our country. Just as culture is influenced by various factors related to history, traditions, etc., language is also affected by these phenomena. Their influence may differ from one region to another, which can produce specific regional language varieties known as dialects. A dialect has lexical, syntactic or phonetic particularitie...
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the growth of digital content on the internet and social media, sentiment analysis has become an emerging field. This problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of a sentiment lexicon as a valuable language resource is one of the imp...
A Review on Challenging Issues in Arabic Sentiment Analysis
Corresponding Author: Ali Hamdi, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, Johor, Malaysia. Abstract: Understanding what people think about an idea or how they evaluate a product, a service or a policy is important for individuals, companies and governments. Sentiment analysis is the process of automatically identifying opinions ex...
Collaboratively Constructed Linguistic Resources for Language Variants and their Exploitation in NLP Application - the case of Tunisian Arabic and the Social Media
Modern Standard Arabic (MSA) is the formal language in most Arabic countries. Arabic Dialects (AD), the language of daily use, differ from MSA, especially in social media communication. However, most Arabic social media texts have mixed forms and many variations, especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for the translation of texts of soc...
Linguistic Audit as a Professional Activity
The subject of this research is linguistic (or: language) audit. The term is new and not yet widely used. In particular, linguistic audit is offered as a service by linguistic-consulting agencies. Modern linguistic consulting, according to the author, is a form of stimulating theoretical and practical development of linguistic ecology, a new branch of applied linguistics, ...